refactor(parser): merge lexer with parser & optimize parser #679
Conversation
Force-pushed from 3360a81 to 51f7956.
I like this, especially the binary expression parsing; it's clever (though I'd say the old parser is easier to digest), and the tests look nice. Checked it out and tested it on one of my projects. There are 1 or 2 places to pick for some more gains. It's up to @jdanyow and @EisenbergEffect to decide what to do with this.
@bigopon Thanks for taking a look!
Which ones? I know numeric and unicode parsing can still be optimized quite a bit by processing the character code points and performing direct conversion, rather than using …
awesome "refactoring" |
Updated number scanning now: for normal integers it will calculate the value instead of parsing it, which is about 15x faster. For floats there's no difference. Also removed the offset thing from …
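For illustration, a minimal sketch (invented names, not the PR's actual code) of what "calculating instead of parsing" an integer looks like: accumulate digits from character codes directly, rather than slicing the string and calling `parseInt`:

```js
// Hypothetical sketch: build the integer value digit by digit from
// char codes instead of input.slice(...) + parseInt(...).
function scanInteger(input, start) {
  let value = 0;
  let i = start;
  while (i < input.length) {
    const code = input.charCodeAt(i);
    if (code < 0x30 /* '0' */ || code > 0x39 /* '9' */) break;
    value = value * 10 + (code - 0x30); // shift in the next decimal digit
    i++;
  }
  return { value, end: i }; // end points at the first non-digit character
}

// scanInteger('42px', 0) -> { value: 42, end: 2 }
```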
My takeaway from the feedback so far is that it's a nice idea but the readability could be improved, so with the latest 2 commits I cleaned things up a bit.
And a few more misc things where I reuse functions instead of inlining them. The inlining didn't give nearly as big a performance boost as the switch from string tokens to bitwise tokens: runtime of the mini benchmark went from ~116ms to ~124ms (compared to ~360ms without the changes), so I'd argue a few % performance loss is probably worth the gain in readability/maintainability.

Regarding splitting the parser/lexer: I kind of had to throw everything on one big pile first so I could see the whole picture. The main point was to parse while scanning (instead of scanning everything and then parsing), as this saves some memory and array operations.
Side note: the runtime could probably also be sped up a little bit by using the numeric tokens instead of strings in the …
I'm ok with having it as one class or splitting it. No real hard opinion from me. If it's easier to maintain together, then I think it's fine to keep it as one. I'd like to get another pass from @bigopon on this, and maybe some feedback from @jdanyow as well, to make sure there aren't any missed edge cases that for some reason weren't covered under existing tests. I'm pretty excited about this :)
This looks awesome, nice work 💪 🚀 My only feedback is around the benchmark: I'd like to see the test cases skewed more towards typical binding use cases, just to confirm we don't have a regression there (I doubt we will). Bindings using AccessScope and AccessMember expressions outnumber other types of expressions by a large factor in typical apps. Could you update the benchmark to use at least 4 times as many simple expressions like …
@jdanyow Good point. I split up the individual parsing measurements so you can see where the differences are, and added a weight factor just in case, so the Access expressions make up the majority of all parses.

I noticed that my version seems to benefit a lot from specifically the way Chrome optimizes hot paths. For the first run (page reload -> press test) the results fluctuate between 90% and 140%. There's a regression in parsing floats, but that's to be expected: it's a tradeoff for speeding up parsing integers (which I think are far more common).

For subsequent runs (press test -> press test -> press test) the results stabilize around 180%. That doesn't really help with initial page loads, but it would be interesting to see if things can be optimized beforehand with SSR / AOT by hinting the browser on where the hot paths are.

Relative results in Edge and FF are somewhat similar for first runs, though Edge is about 3x slower overall and FF about 2x slower. I guess it's mostly array access and switch statement related; Chrome really optimized those.
💯
```diff
    args.push(this.parseExpression());
  }

- result = new ValueConverter(result, name, args, [result].concat(args));
+ result = new ValueConverter(result, name, args);
```
nice.
src/parser.js
```diff
  constructor() {
-   this.cache = {};
-   this.lexer = new Lexer();
+   this.cache = Object.create(null);
```
👍
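A quick illustration of why `Object.create(null)` suits a cache keyed by arbitrary expression strings: the resulting object has no prototype, so keys like `'constructor'` can't collide with inherited members.

```js
const cache = Object.create(null);   // no Object.prototype in the chain
cache['constructor'] = 'parsed AST'; // safe: nothing inherited to clash with

const plain = {};
console.log(typeof plain['constructor']); // 'function' -- inherited, a footgun
console.log('toString' in cache);         // false -- truly empty
```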
@jdanyow Unicode shouldn't be an issue as long as we stick to BMP. Without having to deal with surrogate pairs, it can be as simple as a lookup table (with all the valid identifier code points) in the default switch case in scanToken(). A lookup table can be generated based on unicode-10.0.0.
How big is the lookup table?
@EisenbergEffect Approximately 10kb for identifier start and identifier part (compressed).
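For context, a sketch of what such a table-based check could look like (invented shape; a real implementation would generate compressed code-point ranges from the Unicode data rather than ship a raw 64 KB array, hence the ~10kb figure):

```js
// One flag per BMP code point; 1 means "valid identifier part".
// In practice this would be generated from unicode-10.0.0 at build time.
const identifierPart = new Uint8Array(0x10000);
identifierPart[0x00e9] = 1; // e.g. 'é' (LATIN SMALL LETTER E WITH ACUTE)

function isIdentifierPart(charCode) {
  // Surrogate pairs (astral plane) are deliberately out of scope here.
  return charCode < 0x10000 && identifierPart[charCode] === 1;
}
```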
I would prefer not to add something like that to the core. Do you think there is some way that it could hook in as a plugin for people who really want it?
Sure. Could add something like a … It doesn't necessarily have to be a standalone plugin, btw. As long as the lookup is in a separate file, the build script could output an additional …
The trick is that it needs to work with all build types, including require.js with our current CLI.
Hm, took a look at it, and having an alternative … I added an initial idea to do this via a simple overridable extension point. Performance impact for the default implementation is negligible, as it only does one extra call on the last character of an identifier. Users could provide their own implementation and register it as …

I'm not 100% sure if this is the best name and/or form of encapsulation for the 2 methods. I was thinking it could be better to have a more generic … Thinking here of things like auto object creation (#205), debug logging (#30), globally adding a value converter (don't recall the issue).

What do you guys think? Have a single … Or, to keep the iteration size manageable, leave unicode out of scope for this PR and do that separately.
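To make the shape of such an extension point concrete, a hypothetical sketch (all names invented; the PR's actual hook may look different):

```js
// Default ASCII-only check; one extra call on the last character of an
// identifier is the only cost when nobody overrides it.
export class AsciiIdentifierScanner {
  isIdentifierPart(code) {
    return (code >= 0x61 && code <= 0x7a)   // a-z
        || (code >= 0x41 && code <= 0x5a)   // A-Z
        || (code >= 0x30 && code <= 0x39)   // 0-9
        || code === 0x24 || code === 0x5f;  // $ and _
  }
}

let identifierScanner = new AsciiIdentifierScanner();

// A unicode plugin (shipping its own lookup table) could swap this in:
export function setIdentifierScanner(scanner) {
  identifierScanner = scanner;
}
```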
For sure, out of the scope of this PR; just wanted your thoughts on it. Thanks!
Alright, I removed that last commit. To tie up any remaining loose ends, there are still 3 TODOs: …
(edit: just realized literals and indexed properties are already covered by …)
Alright, with the latest commit the TODOs are gone. Mind double-checking that I addressed them in a desirable manner? Other than that, I think I'm done :)
Last two commits look good. I think the two "kasperl" TODOs were holdovers from the original Angular Dart source for the lexer/parser/AST.
Cool, cheers. Let me know if you need anything else for ze merge.
For extension, we can apply the same strategy as Unparser, but it will be opt-in instead of opt-out.
Damn hardcore! Just want to provide feedback that it all worked in my app.
Nice, thanks!
Will the performance improvement be noticeable in business applications?
@stsje Probably not. The framework is already hardly noticeable: most time is spent rendering, waiting for the network, or executing app logic. Parsing makes up less than 1% of total execution time during the initial load (as do most other processes within Aurelia). The biggest performance improvements are in the hands of app developers.

It can make a difference when you're dynamically generating a lot of HTML and compiling it on the fly, and/or using the computed observation adapter on many getters with large function bodies. That's rare though, and definitely not the goal here.

Firstly, I see this as a proof of concept for applying techniques that may be able to speed up other parts of the framework, slightly decrease initial load times, perform better in benchmarks, and so on. Overall it's in the spirit of Aurelia to try and push performance as much as possible; this is just one small step.

Secondly, I hope this opens the door for implementing a larger portion of the JavaScript syntax for binding, to do more interesting things. With this performance gain, some room is created to do other stuff which would otherwise not be possible without a performance loss.
@fkleuver I was getting ready to merge this. Noticed a small issue with one of the patterns used in the parser tests that have try-catch logic; I think these changes need to be made to each try-catch style test:

```diff
  describe('does not parse LiteralObject with computed property', () => {
    const expressions = [
      '{ []: "foo" }',
      '{ [42]: "foo" }',
      '{ ["foo"]: "bar" }',
      '{ [foo]: "bar" }'
    ];

    for (const expr of expressions) {
      it(expr, () => {
+       let error = null;
        try {
          parser.parse(expr);
-         pass = true;
        } catch (e) {
-         expect(e.message).toContain('Unexpected token [');
+         error = e;
        }
+       expect(error).not.toBeNull();
+       expect(error.message).toContain('Unexpected token [');
      });
    }
  });
```

Otherwise, if the parse doesn't throw as expected, the test will still pass.
@jdanyow Glad you caught that; apparently there were a couple of duplicate tests in there as well, and two little issues with the invalid assignment tests. All should be sorted now.
nice work @fkleuver!
awesome !!!!11
This is a rewrite of the parser that replaces all string-based token comparisons with bitwise operators to speed things up, and also combines the lexer into the parser.
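As an illustration of what "bitwise tokens" buys (the constants below are invented for this sketch, not the PR's actual values): each token is an integer whose bits carry an id plus category flags, so classifying a token is one mask test instead of a string comparison:

```js
// Category flags live in the high bits; the low bits identify the token.
const T_BINARY_OP = 1 << 8; // set on every binary operator token
const T_KEYWORD   = 1 << 9;

const T_PLUS  = T_BINARY_OP | 1;
const T_MINUS = T_BINARY_OP | 2;
const T_TRUE  = T_KEYWORD   | 3;

function isBinaryOp(token) {
  return (token & T_BINARY_OP) !== 0; // single AND, no string compare
}

console.log(isBinaryOp(T_PLUS)); // true
console.log(isBinaryOp(T_TRUE)); // false
```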
Parsing binary expressions is flattened into a single recursive function that takes operator precedence into account (drastically reduces unnecessary nested calls).
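A self-contained sketch of that flattening (simplified, with invented names; the real parser operates on its own token stream rather than an array): a single precedence-climbing function replaces one nested function per precedence level:

```js
// Higher number = binds tighter.
const PRECEDENCE = { '||': 1, '&&': 2, '==': 3, '<': 4, '+': 5, '*': 6 };

// tokens: e.g. [1, '+', 2, '*', 3] -- numbers stand in for operands here.
function parseBinary(tokens, pos = { i: 0 }, minPrec = 1) {
  let left = tokens[pos.i++]; // operand; a full parser would call parseUnary here
  while (pos.i < tokens.length) {
    const op = tokens[pos.i];
    const prec = PRECEDENCE[op];
    if (prec === undefined || prec < minPrec) break; // weaker op: hand back up
    pos.i++; // consume the operator
    // Recurse with prec + 1 so equal-precedence operators stay left-associative.
    const right = parseBinary(tokens, pos, prec + 1);
    left = { op, left, right };
  }
  return left;
}

// parseBinary([1, '+', 2, '*', 3])
// -> { op: '+', left: 1, right: { op: '*', left: 2, right: 3 } }
```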
Functionality should be identical afaik. Error checking/reporting might not be; I still need to check various error scenarios to ensure the strictness is the same. I wanted to gather some feedback first before I spend a bunch of time on that, though :)
Performance improvement of around 200%, depending on the type of expressions parsed. A simple benchmark is available here.
Overall file size is slightly reduced (70 bytes less unminified; probably a bit more when minified, since the added comments get stripped).
@jdanyow @EisenbergEffect Is this viable you think?